Chinese Text Summarization Based On Thematic Area Detection
Abstract
Automatic summarization is an active research area in natural language processing. This paper proposes a method that produces a text summary by detecting thematic areas in Chinese documents. The distinguishing feature of the method is that the resulting summary covers many different themes while markedly reducing redundancy. Latent thematic areas are detected with K-medoids clustering combined with a novel cluster-analysis method that automatically determines K, the number of clusters. In addition, a new measure called representation entropy is used to evaluate summary redundancy. Under the proposed evaluation scheme, experimental results show that the method clearly outperforms a traditional approach without thematic area detection on documents of different genres with free style and flexible themes.
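The abstract does not spell out the pipeline, so the following is only a minimal sketch of thematic-area-based extractive summarization under stated assumptions. Sentences are taken to be already segmented into words and are represented as term-frequency vectors compared by cosine distance. K-medoids uses simple alternating updates with a farthest-point initialization. K is chosen by maximizing the mean silhouette coefficient, a standard stand-in that is not the paper's own cluster-analysis method. Each theme is represented by its medoid sentence, which is also an assumption. All function names (`summarize`, `k_medoids`, `mean_silhouette`, and so on) are hypothetical, and representation entropy is not modeled here.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity of two term-frequency Counters (0 = same direction)."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    norm2 = sum(c * c for c in u.values()) * sum(c * c for c in v.values())
    if norm2 == 0:
        return 1.0  # treat empty sentences as maximally distant
    return max(0.0, 1.0 - dot / math.sqrt(norm2))


def assign(dist, medoids):
    """Label each sentence with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a precomputed distance matrix.

    Assumed initialisation: start from the most central sentence, then
    repeatedly add the sentence farthest from the medoids chosen so far.
    """
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(old)  # keep the old medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def mean_silhouette(dist, labels):
    """Mean silhouette coefficient; singletons contribute 0."""
    n = len(dist)
    total = 0.0
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        others = set(labels) - {labels[i]}
        if not same or not others:
            continue  # silhouette taken as 0 for singletons / a single cluster
        a = sum(dist[i][j] for j in same) / len(same)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def summarize(sentences, k_max=5):
    """Return (K, indices of the medoid sentences in document order).

    Assumption: the silhouette criterion stands in for the paper's
    (unspecified) method of choosing K automatically.
    """
    n = len(sentences)
    if n < 3:
        return n, list(range(n))  # too few sentences to cluster meaningfully
    vecs = [Counter(words) for words in sentences]
    dist = [[cosine_distance(u, v) for v in vecs] for u in vecs]
    best = None  # (score, k, medoids)
    for k in range(2, min(k_max, n - 1) + 1):
        medoids, labels = k_medoids(dist, k)
        score = mean_silhouette(dist, labels)
        if best is None or score > best[0]:
            best = (score, k, medoids)
    _, k, medoids = best
    return k, sorted(medoids)


# Toy input: six pre-segmented sentences on two themes (stock market, football).
docs = [
    ["股票", "市场", "上涨"],
    ["股票", "市场", "下跌"],
    ["股票", "投资", "市场"],
    ["足球", "比赛", "胜利"],
    ["足球", "比赛", "失败"],
    ["足球", "球员", "比赛"],
]
print(summarize(docs))  # (2, [0, 3])
```

Each detected theme contributes exactly one sentence (its medoid), so the sketch illustrates the trade-off the abstract describes. Every theme is covered, while near-duplicate sentences within a theme are left out.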